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Abstract — Multidimensional  scaling  (MDS)  is  a  class  of 
projective  algorithms  traditionally  used  to  produce  two- 
or  three-dimensional  visualizations  of  datasets  consisting 
of  multidimensional  objects  or  interobject  distances.  Re¬ 
cently,  metric  MDS  has  been  applied  to  the  problems 
of  graph  embedding  for  the  purpose  of  approximate 
encoding  of  edge  or  path  costs  using  node  coordinates 
in  metric  space.  Several  authors  have  also  pointed  out 
that  for  data  with  an  inherent  hierarchical  structure, 
hyperbolic  target  space  may  be  a  more  suitable  choice  for 
accurate  embedding  than  Euclidean  space.  In  this  paper 
we  present  the  theory  and  the  implementation  details  of 
MDS-PD,  a  metric  MDS  algorithm  designed  specifically 
for  the  Poincare  disk  model  of  the  hyperbolic  plane.  Our 
construction  is  based  on  an  approximate  hyperbolic  line 
search  and  exemplifies  some  of  the  particulars  that  need  to 
be  addressed  when  applying  iterative  optimization  methods 
in  a  hyperbolic  space  model.  MDS-PD  can  be  used  both 
as  a  visualization  tool  and  as  an  embedding  algorithm.  We 
provide  several  examples  to  illustrate  the  utility  of  MDS- 
PD. 

Index  Terms — dimensionality  reduction,  hyperbolic  em¬ 
bedding,  hyperbolic  MDS,  network  graph,  steepest  descent, 
visualization 

I.  INTRODUCTION 


If  the  target  space  dimension  is  2  or  3,  the  output  con¬ 
figuration  can  be  graphically  represented,  which  makes 
MDS  a  visualization  tool  seeking  to  preserve  the  input 
distances  as  faithfully  as  possible,  thus  clustering  the 
objects  in  the  target  space  by  similarity.  More  generally, 
for  a  given  dimension  d,  metric  multidimensional  scaling 
can  be  used  to  embed  an  input  set  of  dissimilarities  of 
the  original  objects  into  a  d-dimensional  metric  space. 

In  order  to  apply  MDS,  several  design  decisions  must 
be  made.  One  first  needs  to  choose  a  target  metric  space 
of  appropriate  dimension  d  and  a  corresponding  distance 
function.  An  objective  function  should  be  chosen  so 
that  it  provides  a  suitable  measure  of  inaccuracy  for  a 
given  embedding  application.  If  the  objective  function 
is  nonlinear  but  satisfies  some  mild  general  conditions 
(smoothness),  a  numerical  optimization  method  can  be 
chosen  (e.g.  Nocedal  and  Wright  |1999)  for  the  imple¬ 


mentation  of  the  embedding  algorithm. 

The  Euclidean  plane  is  the  most  common  choice 
of  target  space  for  visualization  applications  due  to 
its  simplicity  and  intuitiveness.  Spherical  surface  can 
be  used,  for  example,  to  avoid  the  edge  effect  of  a 
planar  representation  (Cox  and  Cox  1 1 99 1 1.  In  general, 


MDS  on  curved  subspaces  of  Euclidean  space  can  be 


Metric  multidimensional  scaling  (MDS)  (Carroll  and 

viewed  as  MDS  in  a  higher  dimensional  Euclidean  space 

Arabie  1980 ;  |De  Leeuw  and  Heiser  1982;  Cox  and  Cox 

constrained  to  a  particular  surface  (Bender  and  Weeks 

2000;  Borg  and  Groenen  2005)  is  a  class  of  algorithms  1978,  Bloxom  1978). 

that  take  as  input  some  or  all  of  the  inter-object  distances 
( pair  dissimilarities )  for  n  objects  and  produce  as  output 
a  point  configuration  of  n  points  specified  by  their 
coordinates  in  a  chosen  cf -dimensional  target  space.  The 
goal  is  to  return  the  point  configuration  whose  inter¬ 
point  distances  in  the  d-dimensional  space  match  as 
closely  as  possible  the  original  input  distances.  Usually, 
this  goal  is  pursued  by  minimizing  a  scalar  badness- 
of-fit  objective  function  defined  for  an  arbitrary  /7-point 
configuration  in  the  target  space;  ideally,  the  output  of  an 
MDS  algorithm  should  be  the  configuration  that  achieves 
the  global  minimum  of  the  objective  function. 


The  use  of  metric  MDS  in  the  hyperbolic  plane  was 
proposed  in  the  context  of  interactive  visualization  by 
Walterj  ([2004}|,  inspired  by  the  focus  and  context  hyper¬ 


bolic  tree  viewer  of  Lamping  and  Rao  ( 1994).  The  use  of 


MDS  in  (Walter  2004)  inherently  generalized  the  appli¬ 
cability  of  the  method  from  tree  structures  to  continuous¬ 
valued  multidimensional  data  or  pair  distances.  It  was 
noted  that  the  curious  property  of  exponential  growth 
of  the  “available  space”  in  the  hyperbolic  models  as 
one  moves  towards  infinity,  makes  planar  hyperbolic 
embedding  a  suitable  choice  for  both  hierarchical  and 
high-dimensional  data.  The  adequacy  of  the  hyperbolic 
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spaces  for  embedding  of  various  data  was  also  studied 
and  confirmed  in  the  contexts  of  network  embedding  for 


path  cost  estimation  ( 

Shavitt  and  Tankel  2008 )  and  rout- 

ing  ( Kleinbcri 

2007}  Krioukov  et  al. 

2009; 

Cvetkovski 

and  Crovella 

1009;  Papadopoulos  et  a 

.  2010 

). 

A  least  squares  formulation  of  MDS,  to  be  used 
in  conjunction  with  an  iterative  numerical  method  for 
unconstrained  optimization  was  proposed  by  Sammon 


(1969).  The  objective  function  therein  (the  Sammon 
stress  criterion)  was  defined  as  a  normalized  sum  of  the 
squared  differences  between  the  original  dissimilarities 
and  the  embedded  distances  of  the  final  point  configu¬ 
ration.  To  minimize  this  function,  Sammon  proposed  a 
descent  method  with  step  components  calculated  using 
the  first  two  component  derivatives  of  the  objective  func¬ 
tion.  Walter  (2004)  adopted  Sammon’s  badness-of-fit 


measure  for  hyperbolic  MDS  but  observed  that  applying 
Sammon’s  iterative  procedure  in  the  Poincare  disk  (PD) 
using  exact  derivatives  is  difficult  due  to  the  complicated 
symbolic  expressions  of  the  second  derivative  of  the 
hyperbolic  distance  function  in  this  model.  Subsequently, 
the  Levenberg-Marquardt  least  squares  method  was  ap¬ 
plied  in  (Walter]  2004),  using  only  first-order  derivatives 
for  the  optimization,  but  the  details  of  applying  this 
iterative  method  in  the  Poincare  disk  were  not  elaborated. 

In  this  paper  we  present  MDS-PD,  a  metric  multidi¬ 
mensional  scaling  algorithm  using  the  Poincare  disc  (PD) 
model.  MDS-PD  is  based  on  a  steepest  decent  method 
with  line  search.  We  show  the  details  of  the  steepest 
descent  along  hyperbolic  lines  in  the  PD  and  present  a 
suitable  approximate  hyperbolic  line  search  procedure. 
MDS-PD  is  applicable  in  its  own  right;  additionally,  its 
construction  also  illustrates  some  of  the  specifics  that 
need  to  be  considered  when  transferring  more  sophis¬ 
ticated  iterative  optimization  methods  to  the  PD  or  to 
other  hyperbolic  models.  Based  on  this  development,  we 
show  the  particular's  of  a  numerical  implementation  of 
MDS-PD  and  use  this  algorithm  to  carry  out  numerical 
experiments  on  several  datasets  using  least  squares  cri¬ 
terion  functions.  Our  simulation  results  indicate  that  the 
performance  of  a  steepest  descent  method  for  minimizing 
a  least  squares  objective  on  large  configurations  in  the 
PD  is  notably  dependent  on  the  line  search  method  used, 
and  that  binary  hyperbolic  line  search  provides  markedly 
better  convergence  and  cost  properties  for  MDS-PD 
compared  to  more  sophisticated  or  precise  methods. 

The  rest  of  this  paper  is  organized  as  follows.  Section 
[ITJconsohdates  the  notation  and  concepts  from  hyperbolic 
geometry  that  will  be  used  throughout,  and  proceeds  to 


develop  two  of  the  building  blocks  of  MDS-PD  -  steep¬ 
est  descent  in  the  PD  and  a  corresponding  hyperbolic 


line  search.  Section  III  considers  particular  objective 
functions  and  gradients  and  further  discusses  properties 
and  applicability  of  multidimensional  scaling  in  the  PD. 


Section  IV  presents  illustrative  examples  of  MDS-PD 


operation  in  the  context  of  synthetic  as  well  as  real-world 
input  data.  Related  work  is  discussed  in  Section  [V]  and 


concluding  remarks  are  given  in  Section  VI 


II.  A  DESCENT  METHOD  FOR  THE  POINCARE 

DISK 


We  start  this  section  by  introducing  our  notational 
conventions  and  establishing  some  properties  of  the 
Poincare  disk  that  will  be  useful  in  the  subsequent  devel¬ 
opment.  This  will  allow  us  to  formally  define  a  Poincare- 
disk  specific  descent  method  and  a  binary  hyperbolic 
line  search  algorithm,  that  together  make  a  simple,  yet 
efficient  iterative  minimization  method  for  this  model  of 
the  hyperbolic  plane. 


A.  Preliminaries 

The  Poincare  Disk  model  of  the  hyperbolic  plane  is 
convenient  for  our  considerations  since  it  has  circular 
symmetry  and  a  closed  form  of  the  inter-point  distance 


formula  exists  (Anderson  2007).  We  will  be  using  com¬ 


plex  rectangular  coordinates  to  represent  the  points  of 
the  hyperbolic  plane,  making  the  PD  model  a  subset  of 
the  complex  plane  C: 


1  =  {z  e  C  |  |z|  <  1} . 


(1) 


The  hyperbolic  distance  between  two  points  Zj  and  Zk 
in  D  is  given  by 

£  •  — 

dn(zj,Zk)  =2atanh-^ — =L,  (2) 

| 1  —  ZjZk  | 

where  z  denotes  the  complex  conjugate. 

Mobius  transformations  are  a  class  of  transformations 
of  the  complex  plane  that  preserve  generalized  circles. 
The  special  Mobius  transformations  that  take  D  to  D  and 
preserve  the  hyperbolic  distance  have  the  form 

T  (z)  =  V — _ ,  a,Z?G  C,  |«|2  —  |Z?|2  7^  0.  (3) 

bz  + a 

Given  a  point  zo  £  ©  and  a  direction  y  G  C  with 
|  y|  =  1 ,  we  can  travel  a  hyperbolic  distance  .v  >  0  along 
a  hyperbolic  line  starting  from  zo  in  the  direction  y, 
arriving  at  the  point  z'0- 
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Lemma  1.  For  zo  G  ID),  ye  C  with  \y\  =  1,  and  s  >  0, 
the  point 

,  _  ytanh  |  +zo 
<0  zo  ytanh  §  +  1 

(i)  belongs  to  the  hyperbolic  ray  passing  through  zo  and 
having  direction  y  at  zo,  and 

(ii)  d0(zo,z'0 )  =5. 


Proof:  Given  a  point  zo  e  D  and  a  direction  y  e  C 
with  \y\  =  1,  the  hyperbolic  ray  in  D  passing  through 
Zo  and  having  direction  y  at  zo  can  be  parametrized  by 
re  [0, 1)  as 


fir) 


ry+zo 
ryzd  + 1 


(4) 


Noting  that  (|4]),  seen  as  a  function  of  z  =  ry: 


T{z) 


z  +  zo 

ZZ0+  1 


is  a  Mobius  transformation  taking  D  to  D  and  preserving 
hyperbolic  distances,  we  see  that 


do{f{r),zo)  =  do  (0,  r )  =  In 


1+r 
1  —  r 


whence  it  follows  that  moving  zo  along  a  hyper¬ 
bolic  line  in  the  direction  y  by  a  hyperbolic  distance 
s  =  In ((1  +  r)  /  (1  —  r))  we  arrive  at  the  point  z!q  = 
/  (tanh  § ) . 

■ 

Next,  we  introduce  some  of  the  notation  that  will  be 
used  throughout. 

•  Let  the  point  configuration  at  iteration  t  =  1,2, . . .  T 
consist  of  n  points  in  the  Poincare  disk  D 


Zj(t), 

represented  by  their  rectangular  coordinates: 

Zj  0)  =yj,i  {t)  +  iyj,2{t),  i=  x/3!,  yjA,yjp€R 

with  |  Zj  (f)|  <  1. 

•  We  also  use  vector  notation  to  refer  to  the  point 
configuration 

z(t)=[ziO)  Z2 (t)  ...  zn{t)  }T  =  y\  +  iyi  = 
=  [yt,i(0  y2,\{t)  yn,i(t)]T  + 
i[yi,2{t)  y2,i(t)  ...  yn,2(t)  ]T , 

where  [-]r  in  this  work  indicates  the  real  matrix 
transpose  (to  be  distinguished  from  the  complex 
conjugate  transpose.) 

•  The  distance  matrix  for  a  given  point  configuration 
z  is  the  real  valued  symmetric  matrix  D  (z)  = 


[djk] n/n  whose  entry  dp  is  the  hyperbolic  distance 
between  points  zj  and  Zk  in  the  configuration  z: 

djk  =  do  ( Zj,Zk )  ■ 

•  The  dissimilarity  matrix  A  =  [djk]nxn  is  a  symmet¬ 
ric,  real- valued  matrix  containing  the  desired  inter¬ 
point  distances  of  the  final  output  configuration  (the 
dissimilarities).  The  diagonal  elements  are  5;/  =  0 
and  all  other  entries  are  positive  real  numbers: 
djk  =  5kj  >  0  for  j  k. 

•  The  indicator  matrix  I  =  [ljk]nxn  is  a  symmetric 
0-1  matrix,  used  to  allow  for  missing  dissimilarity 
values.  The  entries  of  I  corresponding  to  missing 
values  in  A  are  set  to  0.  All  other  entries  are  set  to 
1. 

•  The  weight  matrix  W  =  [wjk]  n/n  is  a  symmetric, 
real-valued  matrix  introduced  to  enable  weighting 
of  the  error  terms  for  individual  pairs  of  points  in 
the  objective  function  sum.  For  convenience,  wjk 
corresponding  to  missing  dissimilarities  are  set  to 
some  finite  value,  e.g.  1. 

•  The  objective  function  to  be  minimized  is  the 
embedding  error  function  E,  =  Et  (z.  A.  W.  I)  that, 
given  the  sets  of  dissimilarities  and  weights,  as¬ 
sociates  to  a  configuration  z  an  embedding  error 
Et.  An  example  of  an  error  function  is  the  sum  of 
relative  squared  differences 

”  "  fdik(t)-8i  k\2 

E,{ z,A,W,I)  =  £  £  wpIjA^fd - f  \  . 

j=lk=i+ 1  V  djk  J 

(5) 

The  objective  function  can  optionally  be  normalized 
per  pair  by  dividing  with  the  number  of  summands 
( n 2  —  n)  /2. 


B.  Descent  in  the  Poincare  disk 

Given  a  configuration  of  points  z,  matrices  A,  W, 
and  I,  the  distance  function  do(zj,Zk),  and  an  objective 
function  E  (z,  A,W,I),  define 


-  dE  |  dE  - 
dyi,i  dy ij2 

81 

V£dif 

dE  |  ■  dE 
dyi,i  dyi.i 

= 

82 

dE  |  ■  dE 
-  4y„ii  ~rldy„t2  . 

8n 

According  to  Lemma[I]  moving  the  points  z i ■  z«  of 
the  configuration  z  along  distance  realizing  paths  in  the 
PD  defined  respectively  by  the  directions  —  g\ , . . . ,  —  gn 
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Figure  1.  An  example  of  moving  a  4-point  configuration  in  a  given 
(descent)  direction  along  distance  realizing  paths  of  the  Poincare  disk 


at  z  (Fig.  [T])  will  result  in  configuration  t!  with  points 

-r8j+Zj 


/ 

zj  = 


(7) 


~rSjZj  +  1 

where  r  >  0  is  the  step-size  parameter  which  determines 
the  hyperbolic  distances  sj  traveled  by  zf 

1+r\gj\ 


s:  =  In 


1 


\gj\ 


(8) 


The  PD  model  (|Tj)  implies  the  constraints  \zj\  <  1  for 
the  point  coordinates.  Still,  the  optimization  on  the  PD 
can  be  viewed  as  unconstrained  by  observing  that  the 
constraints  z'j  <  1  will  not  be  violated  while  moving  a 
configuration  z  in  D  if  the  distances  sj  traveled  by  each 
point  are  always  kept  finite,  i.e. 


sM  =  vaaxjSj  <  oo.  (9) 

Since  ([9j>,  according  to  <[8j),  corresponds  to  rmaxj  |gy|  < 
1,  we  have  the  constraint  on  r 
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When  implementing  iterative  descent  minimization 
methods  with  line  search  in  the  Poincare  disk,  it  is 
important  to  specify  a  hyperbolic  distance  window  sm 
along  the  descent  lines  where  the  next  configuration  will 
be  sought.  In  this  case  the  corresponding  value  of  the 
parameter  r  is 

m  =  u4p  -tanh^  <  t— 'j— .  (10) 

g  2  g 

Since  the  Poincare  disk  model  is  conformal,  following 
the  direction  — g  (the  opposite  of  ([6]))  corresponds  to  the 
steepest  descent  optimization  method.  Moving  the  point 
configuration  along  hyperbolic  lines  (distance  realizing 


paths),  on  the  other  hand,  ensures  that  the  steepest 
descent  direction  is  exhausted  most  efficiently  given  the 
current  information  about  the  objective  function. 

C.  A  Steepest  Descent  Algorithm  for  the  PD 
Figure  [2]  shows  a  framework  for  MDS-PD. 

Algorithm  MDS-PD 
Input  data: 

an  initial  configuration  z  ( 1 ) 
the  dissimilarities  A,  weights  W,  indicators  I 
Input  parameters: 
an  objective  function  E  (z,  A,W,I) 
the  stopping  tolerances  Ee,  E\e,  £g,  £r,  Tm 
Output: 

a  final  point  configuration  z  (T) 
a  final  embedding  error  Ej 


@i) 


Initialize: 

t  i —  1 ;  sm  i —  10;  E—  \  i —  °° ;  z  i —  z  ( 1 ) ;  ... 

Loop: 

£<-E(z,A,W,I);  g<— V£(z,A,W,I); . {J2 

rM  ■*“  li'tanhf; . IP 

Break  if 

E  <  Ee 

or  E-i  —E<  E/^e 
or  ||g|L  <  £g 
or  rM  <  Er 
or  t  >  Tm\ 

E~  1  4—E\ 

r  •(— FlypLineSearch(£'(z,  A,  V  5} 

V;G{l..n}, 
t<-t  + 1; 

Return  z  (T)  4—  z  and  Ej  4—  E  (z,  A,  W,I). 


Figure  2.  MDS-PD 


The  input  data  of  MDS-PD  consists  of  the  initial  con¬ 
figuration  z  ( 1),  and  the  input  metric:  the  dissimilarities 
A  with  the  associated  weights  W  and  the  indicators  of 
missing  dissimilarities  I.  The  input  parameters  are  the 
objective  error  function  2?(z,A,W,I)  and  the  stopping 
tolerances  Ee,  Eae,  %,  £r,  and  Tm-  The  output  of  MDS- 
PD  consists  of  the  final  point  configuration  z  (T)  and  its 
associated  embedding  error  Ej  =  E  (z  (T) ,  A,  W,I). 

The  initialization  {[2]  1 }  sets  the  maximum  hyperbolic 
distance  sm  that  can  be  traveled  by  any  point  of  the 
configuration,  and  the  previous  value  of  the  embedding 
error  E  \ . 

Each  iteration  starts  by  determining  the  gradient  of 
the  error  in  the  current  configuration  {[2]2 }  and  the 
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corresponding  window  {[2]  3}  for  the  parameter  r 

(Eq.  (fT0|».  A  hyperbolic  line  search  (described  in  Sec. 
II-D[)  is  performed  {|2]5}  in  the  direction  of  the  steepest 
descent  — g  of  the  embedding  error  and  the  resulting  step- 
size  parameter  r  is  used  in  {[2]  6}  to  arrive  at  the  next 
configuration  as  in  ([7]). 

Several  stopping  criteria  are  used  (line  |[2j4})  to 
terminate  the  search.  Ideally,  the  algorithm  exits  when 
the  embedding  error  is  close  to  0  (E  <  £e).  Termination 
also  occurs  in  the  cases  when  the  error  decreases  too 
slowly  (E  i  —E<  Eae),  or  when  the  gradient  or  the  step¬ 
ping  parameter  become  too  small  (HgH^  <  £g,  rM  <  er). 
Finally,  Tm,  the  maximum  allowed  number  of  iterations, 
is  used  as  a  guard  against  infinite  looping. 

The  line  search  subprogram  used  in  {[2]5 }  is  described 
next. 


Figure  3.  Acceptable  step  lengths  for  inexact  line  search  obtained 
from  the  sufficient  decrease  condition. 


D.  Approximate  Hyperbolic  Line  Search 

An  exact  line  search  could  be  used  in  line  {[2j5 }  (Fig. 
[2])  to  determine  a  value  for  the  step  size  r  such  that  the 
corresponding  new  configuration  {[2]6 }  achieves  a  local 
minimum  of  the  embedding  error  along  the  search  path 
with  tight  tolerance: 


;argmin  re[0Mq{r). 


(11) 


solution  configuration.  A  popular  approach  to  defining 
sufficient  decrease  is  to  define  the  “roof”  function 

A  (r)  =  q(0)  +p  ■  q'  (0)  ■  r,  0<p<l  (12) 

which  is  a  line  passing  through  (0,  <7 (0))  and  having  a 
slope  which  is  a  fraction  of  the  slope  of  q(r)  at  r  =  0. 
With  this  function,  we  define  that  sufficient  decrease  is 
provided  by  all  values  of  r  such  that 


where  q  (r)  is  the  embedding  error  as  a  function  of  r. 

However,  increasing  the  precision  of  this  computation 
is  not  essential  to  the  convergence  performance  since  the 
steepest  descent  search  direction  is  only  locally  optimal. 
Furthermore,  exact  line  search  can  fail  to  converge  to 
a  local  minimum  even  for  a  second  degree  polynomial 
due  to  finite  machine  precision  (Frandsen  et  al.  [2004  >.  It 
is  now  accepted  in  the  numerical  optimization  literature 
that  approximate  line  search  provides  convergence  rates 
comparable  to  the  exact  line  search  while  significantly 
reducing  the  computational  cost  per  line  search.  In  fact, 


q  (r)  <  X  (r) ,  r  E  (0,  rM\ 


(13) 


the  step  calculation  used  in  Sammon  (1969)  is  a  “zero- 
iteration”  approximate  line  search,  where  the  step  size  is 
simply  guessed  based  on  the  first  two  derivatives  of  the 
error.  Conceivably,  the  simplest  inexact  step  calculation 
would  guess  the  step  size  based  only  on  the  directional 
gradient  at  the  current  configuration. 

Approximate  line  search  procedures  aim  to  reduce  the 
computational  cost  of  determining  the  step  parameter  by 
posing  weaker  conditions  on  the  found  solution:  Rather 
than  searching  for  a  local  or  global  minimizer  of  q  (r) 
on  (0 ,tm\,  a  value  is  returned  by  the  line  search  function 
as  satisfactory  if  it  provides  sufficient  decrease  of  the 
objective  function  and  sufficient  progress  toward  the 


Fig.  [3]  shows  an  example  of  acceptable  step  length 
segments  obtained  from  the  sufficient  decrease  condition 

m 

To  ensure  sufficient  progress,  we  adopt  a  binary 
search  algorithm  motivated  by  the  simple  backtracking 
approach  (e.g.  Nocedal  and  Wright  (|1999 1).  The  details 
are  given  in  Fig.  [4] 

We  start  the  line  search  with  an  initial  guess  ro  for 
the  step  size  parameter,  and  in  the  expansion  phase 
©1}  we  double  it  until  it  violates  the  window  r^ 
or  the  sufficient  decrease  condition.  In  the  reduction 
phase  {02},  we  halve  r  until  it  finally  satisfies  both  the 
window  requirement  r  <  rM  and  the  decrease  criterion 
<?(r)  <  A(r). 

We  observe  that,  when  started  at  a  point  with  nonzero 
gradient,  the  line  search  will  always  return  a  nonzero 
value  for  r.  Since  the  returned  acceptable  step  r  is  such 
that  the  step  2  •  r  is  not  acceptable,  there  will  be  a 
maximum  acceptable  point  rm  from  the  same  acceptable 
segment  as  r,  such  that  r  <  rm  <2  ■  r,  whence  r  >  rm/2. 
In  other  words,  the  returned  value  is  always  in  the  upper 
half  of  the  interval  [0,  rm\  and  we  accept  this  as  sufficient 
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Procedure  HypLineSearch 

Input  data: 

an  initial  guess  of  the  step  parameter  tq 
the  maximum  step  value  tm 
the  function  q  ( r ) 

Input  parameters: 

the  slope  parameter  p  for  the  roof  function  A  (r) ; 

Output: 

an  acceptable  step  parameter  r 

Initialize: 

r<r-  r0; 

While  r  <  t~m  and  q(r)  <  A  (r) , 


r^2-r,  . {01} 

While  r  <  rM  or  q{r)>  X  (r) , 

r<  -r/2; . {02} 

Return  r. 


Figure  4.  Line  search  procedure  for  MDS-PD 


progress  toward  the  solution,  thus  eliminating  some  more 
computationally  demanding  progress  criteria  that  would 
require  calculation  of  q'(r )  at  points  other  than  r  =  0 


or  cannot  always  return  a  nonzero  r  (cf.  Nocedal  and 
Wright  1999[  Frandsen  et  al.  2004 1. 


It  remains  to  show  how  to  calculate  the  slope  of  A  (r), 


that  is  pq'  (0)  (Eq.  12).  Given  a  configuration  z  and  a 


direction  — g  =  — VE  (z,  A,  W,I),  the  configuration  z'  as 
a  function  of  r  0  can  be  conveniently  represented  as  a 
column-vector  function 

M  (— rg,z)  (14) 

whose  j- th  entry  is  the  Mobius  transform 

-rgjZj+l 

The  associated  embedding  error  as  a  function  of  r  is  then 
4(r)=£(M(-rg,z),A,W,I),  (15) 

and  it  can  be  easily  shown  that  its  slope  is  given  by 

=  (ReM'(-rg,z))rReV£(M(-rg,z),A,W,I) 
+  (imM1  (— rg,z)}T Im VE  (M  (— rg,z) ,  A,  W,I) 


where  the  entries  of  M'(— rg,z)  are  given  by 

M’(r)  =  2Ml{r)=Sj±LzE.. 


We  thus  have  a  general  explicit  formula  for  calculating 
q'  (r)  given  a  configuration  z  and  the  corresponding 
gradient  g  of  E  at  z.  In  particular,  this  formula  can  be 
used  to  calculate  pq'  { 0),  the  slope  of  A  (r). 


III.  MULTIDIMENSIONAL  SCALING  IN  THE 

PD 


A.  Objective  Functions  and  Gradients 

The  iterative  minimization  method  presented  in  Sec. 
[n]  requires  a  choice  of  an  embedding  error  function  with 
continuous  first  derivatives.  In  this  work  we  consider  the 
least  squares  error  function 


E  =  c'Yj  52  Cjk  {d jk  ~  aSjk) 2  •  (16) 

j=ik=j+ 1 


We  note  that  (|T6|)  is  a  general  form  from  which  several 
special  embedding  error  functions  can  be  obtained  by 
substituting  appropriate  values  of  the  constants  c,  Cjk, 
and  a.  Examples  include: 

•  Absolute  Differences  Squared  (ADS) 

■A  =  52  12  wjk{^jk{djk  —  a8jk))  (17) 

j=ik=j+ 1 


Relative  Differences  Squared  (RDS) 

d  jk  a 


£  =  I  I  Wjk  (  Ijk- 

j=lk=j+l 


ad 


jk 


(18) 


S  amnion  Stress  Criterion  (SAM) 


E=—, - K - £  £  »ik 

L  iiAk  j-u-'+I 


{ljk{djk  a&jk)) 


a8 


jk 


j=lk=j+ 1 


(19) 


As  the  most  general  case  of  ([T6|),  individual  importance 
dependent  on  the  input  dissimilarities  can  be  assigned  to 
the  pairwise  error  terms  using  the  the  weights  terms  wjk. 

MDS-PD  also  requires  calculation  of  the  gradient  of 
the  error  function.  For  a  general  error  function,  closed 
form  symbolic  derivatives  may  or  may  not  exist,  and 
in  the  latter  case  one  can  resort  to  approximating  the 
gradient  using  finite  difference  calculations.  Numerical 
approximation  may  also  incur  lower  computational  cost 
than  the  formal  derivatives.  However,  the  use  of  nu¬ 
merical  derivatives  can  introduce  additional  convergence 
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problems  due  to  limited  machine  precision. 

For  the  sum  both  approaches  are  possible.  A 
symbolic  derivation  of  the  gradient  of  (fT6|),  including 
both  the  Euclidean  and  hyperbolic  cases,  can  be  easily 
carried  out  and  is  omitted  here  for  brevity.  From  the 
obtained  result,  symbolic  derivatives  of  (|T7|)-(|T9|).  as 
well  as  any  other  special  cases  derivable  from  (f]~6j)  can 
be  obtained  by  substituting  appropriate  constants. 

B.  Local  vw.  Global  Minima 

MDS-PD,  being  a  steepest  descent  method  that  termi¬ 
nates  at  near-zero  progress,  can  find  a  stationary  point 
of  the  objective  function.  In  the  least  squares  case,  if  the 
value  at  the  returned  solution  is  close  to  zero  (that  is, 
E  <  eE),  then  the  final  configuration  can  be  considered 
a  global  minimizer  that  embeds  the  input  metric  with 
no  error.  In  all  other  cases,  a  single  run  of  MDS- 
PD  cannot  distinguish  between  local  and  global  points 
of  minimum  or  between  a  minimizer  and  a  stationary 
point.  A  traditional  way  of  getting  closer  to  the  global 
minimum  in  MDS  is  to  run  the  minimization  multiple 
times  with  different  starting  configurations.  Expectedly, 
there  will  be  accumulation  of  the  results  at  several 
values,  and  the  more  values  are  accumulated  at  the  lowest 
accumulation  point,  the  better  the  confidence  that  the 
minimal  value  represents  a  global  minimum  i.e.  the  least 
achievable  embedding  error. 

C.  Input  Data  Curvature  Matching 

The  objective  functions  used  in  metric  Euclidean 
MDS  are  typically  constructed  to  be  scale-invariant  in 
the  sense  that  scaling  the  input  dissimilarities  and  the 
coordinates  of  the  output  configuration  with  the  same 
constant  factor  a  does  not  change  the  embedding  error. 
This  is  possible  for  Euclidean  space  since  the  Euclidean 
distance  function  scales  by  the  same  constant  factor  as 
the  point  coordinates:  (Y%=i(a-yjS  —  a-yks)2)1'2  =  a -dp. 
Thus,  for  example,  if  djk  is  the  Euclidean  distance,  then 
the  sums  (fi~8j)  and  (fl9|)  are  scale-invariant,  whereas  (fT7|) 
is  not.  However,  when  djk  is  the  hyperbolic  distance 
function  Q,  none  of  the  (fT7|)-([T9l)  is  scale-invariant. 
Therefore,  the  simplest  ADS  error  function  (jTTj)  may  be 
a  preferable  choice  for  reducing  the  computational  cost 
in  the  hyperbolic  case.  Nevertheless,  for  our  numerical 
experiments  we  choose  to  apply  the  S  amnion  criterion 
(|~i~9])  so  as  to  facilitate  numerical  comparison  between 
the  final  embedding  errors  for  the  Sammon  map  in  the 
Euclidean  plane  and  MDS-PD. 

The  lack  of  scale-invariance  of  the  hyperbolic  distance 
formula  ([2])  implies  an  additional  degree  of  freedom 


in  the  optimization  of  the  embedding  error  -  the  dis¬ 
similarity  scaling  factor.  In  Eqs.  j[T6|)-([T9|)  this  extra 
degree  of  freedom  is  captured  via  the  parameter  a  that 
scales  the  original  entries  of  the  dissimilarity  matrix.  The 
dependency  of  the  embedding  error  of  MDS-PD  on  the 
dissimilarity  scaling  factor  a  varies  with  the  type  of  input 


data  and  is  investigated  in  more  detail  in  Section  IV 
IV.  MDS-PD:  EXPERIMENTAL  RESULTS 


Following  the  specification  in  the  previous  sections, 
we  successfully  implemented  MDS-PD  with  the  error 
functions  (fT7j)-([T9|).  In  this  section  we  show  a  few 
illustrative  results  of  our  experimental  study  using  MDS- 
PD  on  synthetic  as  well  as  real-world  data.  Some  of 
the  methods  we  used  to  verify  the  correctness  of  our 
specification  are  also  discussed  below. 


A.  An  Illustrative  Example 

As  a  first  example,  we  used  a  random  configuration 
of  seven  points  in  the  Poincare  disk.  We  populated  the 
input  dissimilarity  matrix  with  the  hyperbolic  inter-point 
distances  and  started  MDS-PD  from  another  randomly- 
generated  seven  point  initial  configuration  in  the  PD.  Fig. 
[5]  shows  the  trajectories  traveled  by  the  points  during 
the  minimization.  The  clear  points  denote  the  initial 
configuration  and  the  solid  points  represent  the  final 
point  configuration.  Fig.  [6]  shows  the  MDS-PD  internal 
parameters  vs.  the  iteration  number  for  this  example:  In 
Fig.  [6ji,  the  embedding  error  E  monotonically  decreases 
with  every  iteration;  the  iterations  terminated  with  the 
E  <  £e  =  1 0  6  condition,  which  means  that  likely  the 
output  configuration  represents  the  global  minimum  and 
the  final  inter-point  distances  match  the  input  dissim¬ 
ilarities  very  closely.  The  step-size  parameter  r  was 
initialized  with  a  value  of  1  and  assumed  only  values 
of  the  form  2k,  for  integral  k  (Fig.  [hji).  The  exponential 
character  of  the  change  of  r  according  as  ©1}  and 
©2}  (Fig.  |§  ensures  the  low  computational  cost  of 
the  line  search  subprogram  and  in  our  numerical  studies 
proved  superior  to  other  line-search  strategies,  including 
exact  search  or  adaptive  approximate  step-size  parameter. 
The  refining  of  the  step  size  as  the  current  configuration 
approaches  a  local  minimum  of  the  error  function,  on  the 
other  hand,  is  achieved  by  the  decrease  of  the  gradient 
norm.  This  is  further  illustrated  in  Figs.  [6];  and  [6]J. 

B.  Scaling  of  Synthetic  Data 

To  investigate  the  dependency  of  the  embedding  error 
on  the  dissimilarity  scaling  factor  a,  we  used  as  input  the 
inter-point  distances  obtained  from  random  sets  of  points 


Figure  5.  The  minimization  trajectory  for  a  seven  point  configura¬ 
tion  using  MDS-PD.  The  clear  and  the  solid  points  are  respectively 
the  initial  and  the  final  point  configuration. 

(a)  Embedding  error  E  (b)  Step-size  parameter  r 


Figure  6.  The  MDS-PD  internal  parameters  vs.  the  iteration  number 
for  the  seven  point  example  of  Fig.  [5]  (a)  the  embedding  error  E ,  (b) 
the  step-size  parameter  r,  (c)  the  norm  of  the  gradient  HgH^,  and  (d) 
the  step-size  parameter  relative  to  the  maximum  allowed  value  r/ria- 

residing  on  surfaces  of  constant  positive,  zero  or  negative 
curvature  (i.e.  respectively  a  sphere,  a  Euclidean  plane 
and  the  Poincare  disk  model  of  the  hyperbolic  plane.) 
The  corresponding  distances  (spherical,  Euclidean  and 
hyperbolic)  for  all  pairs  were  used  as  dissimilarities 
in  this  experiment.  The  embedding  error  function  was 
the  Sammon  criterion  m  We  also  used  noisy  inputs 
obtained  by  replacing  each  original  dissimilarity  8jk 


with  a  random  number  uniformly  distributed  in  the 
interval  [(1  —em)8jk,  (1  +  em)8jic]  for  a  chosen  noise 
level  em  <  1. 

Fig.[7]shows  the  typical  effects  of  dissimilarity  scaling 
for  several  Euclidean,  spherical,  and  hyperbolic  graphs. 
Cases  (a),  (c),  and  (e)  illustrate  the  variation  of  the  em¬ 
bedding  error  for  noiseless  input  data,  with  the  number 
of  points  as  a  parameter  (20  and  60  points.)  Cases  (b), 
(d),  and  (f)  illustrate  the  variation  of  the  embedding  error 
for  noisy  input  data  and  are  parametrized  by  the  amount 
of  measurement  noise  ( em  =  0, 10,20,30%.)  Each  point 
in  the  diagrams  (a)-(f)  was  obtained  as  a  minimum 
Sammon  stress  in  a  series  of  70  replicates  of  MDS-PD 
for  different  randomly  chosen  initial  configuration  in  the 
PD.  The  smoothness  of  the  obtained  curves  demonstrates 
that  for  the  chosen  problems,  this  number  of  replicates 
was  enough  to  approach  the  global  minimum  achievable 
embedding  error  for  each  simulated  value  of  a.  The 
results  are  drawn  on  semilogarithmic  axes  in  order  to 
show  more  details  toward  small  a  values. 

Locally,  the  Poincare  disk  model,  distance-wise  “looks 
like”  the  Euclidean  plane  scaled  with  some  constant 
factor.  For  example,  in  the  vicinity  of  a  point  zo,  the 
hyperbolic  distance  formula  (|2j)  becomes  d®(zj,Zk)  ~ 

| Zj  —  Zk |  -2/(1  —  |zo|2)-  Therefore,  for  a  sufficiently  small 
scaling  factor  a  and  sufficiently  many  replicates,  metric 
MDS  implementations  for  the  PD  model  and  for  the 
Euclidean  plane  using  the  same  scale-invariant  (for  Eu¬ 
clidean  distances)  error  function,  should  return  approx¬ 
imately  equal  embedding  errors  for  the  final  configura¬ 
tions.  (A  sufficiently  small  value  of  a  is  one  that  would 
make  the  final  configuration  land  in  a  sufficiently  small 
neighborhood  of  a  point  in  the  PD.)  In  this  sense,  MDS- 
PD  is  a  generalization  of  an  Euclidean  MDS  algorithm. 
We  used  these  observations  to  verify  that  our  MDS- 
PD  implementation  returned  the  expected  error  values 
for  small  scaling  factors.  Indeed,  as  Fig.  [7]  shows,  for 
small  a  values,  the  Euclidean  graphs  were  embeddable 
with  no  error,  and  the  other  two  graph  types  had  stress 
that  numerically  matched  the  output  of  other  available 
Euclidean  MDS  implementations  using  the  Sammon 
stress  criterion. 

In  the  cases  (e)  and  (f),  the  original  configurations 
are  residing  on  a  hyperbolic  plane,  and  therefore  are 
embeddable  with  zero  stress  in  the  PD  model  for  some 
value  of  a  (a  =  1  in  this  synthetic  example).  For  this 
value,  our  implementation  of  MDS-PD  was  able  to 
find  the  original  configuration  up  to  hyperbolic -distance 
preserving  Mobius  transformations.  The  existence  of 
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Planar 

Euclidean 


(c) 


(d) 


Figure  7.  Effects  of  dissimilarity  scaling  for  Euclidean,  spherical, 
and  hyperbolic  graphs.  Figures  (a),  (c),  and  (e)  show  the  variation 
of  the  embedding  error  for  noiseless  input  data,  with  the  number  of 
nodes  as  a  parameter.  Figures  (b),  (d),  and  (f)  show  the  variation  of 
the  embedding  error  for  noisy  input  data  parametrized  by  the  amount 
of  measurement  noise. 


graphs  with  a  hyperbolic  “underlying  structure”  that  are 
embeddable  with  notably  lower  stress  in  2-dimensional 
hyperbolic  than  in  2-dimensional  Euclidean  space  is  an 
important  illustration  of  the  usefulness  of  MDS  in  the 
Poincare  disk.  Candidate  graphs  having  such  hyperbolic- 
like  structure  are  real-world  communication  or  social 
networks  that  tend  to  have  a  strongly  interconnected  core 
and  a  sparser  periphery  of  tendrils.  MDS-PD  can  be 
used  in  such  contexts  to  investigate  the  “hyperbolicity” 
of  the  input  data  and  arrive  at  lower  stress  dissimi¬ 
larity  embedding.  The  diagrams  (b),  (d),  and  (f)  (Fig. 
[7])  also  demonstrate  that  relatively  high  noise  levels 
in  the  measured  data  do  not  significantly  change  the 
suitability  for  embedding  in  the  PD  in  the  cases  when 
the  original  dissimilarity  matrix  has  a  natural  underlying 
2-dimensional  space. 


C.  Real-World  Graph  Examples 

To  further  demonstrate  the  ability  of  the  Poincare  disk 
to  accommodate  lower  stress  2-dimensional  embeddings 
than  classical  Euclidean  MDS  for  certain  graph  types,  we 
applied  MDS-PD  to  dissimilarity  matrices  obtained  from 
several  real-world  datasets.  In  this  section  we  summarize 
the  results. 

1)  The  Iris  Dataset:  As  a  first  experiment,  we  ap¬ 


plied  MDS-PD  to  the  Iris  dataset  (Anderson  1935 1. 
This  classical  dataset  consists  of  150  4-dimensional 
points  from  which  we  extracted  the  Euclidean  inter¬ 
point  distances  and  used  them  as  input  dissimilarities. 
The  embedding  error  as  a  function  of  the  scaling  factor 
a  is  shown  in  Fig.  [8]  Each  value  in  the  diagram  is 
obtained  as  a  minimum  embedding  error  in  a  series 
of  100  replicates  starting  from  randomly  chosen  initial 
configurations.  Minimal  embedding  error  overall  was 
achieved  for  a  «  4.  The  improvement  with  respect  to 
the  2-dimensional  Euclidean  case  was  10%.  The  Iris 
dataset  is  an  example  of  dimensionality  reduction  of 
an  original  higher-dimensional  dataset  that  can  be  done 
more  successfully  using  the  PD  model. 

2)  Political  Books:  An  interesting  network  was  pre¬ 


sented  by  Krebs  (2008),  who  assembled  a  connectivity 
graph  of  political  books  frequently  bought  together  dur¬ 
ing  an  election  campaign.  In  the  graph  version  we  used, 
there  were  105  nodes  representing  books  and  a  total 
of  441  undirected,  unweighted  links  between  books  that 
were  frequently  bought  together.  We  obtained  dissimilar¬ 
ities  by  assigning  self-dissimilarity  of  0,  dissimilarity  of 
1  for  co-purchased  books  and  a  missing  (unknown)  dis¬ 
similarity  for  the  remaining  pairs  and  applied  MDS-PD 
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Hyperbolic  Scaling  of  the  Iris  Dataset 


Figure  8.  The  effect  of  scaling  of  the  dissimilarities  on  the 
embedding  error  for  the  Iris  Dataset  ljAnderson[|1935|.  The  input 
dissimilarities  are  the  Euclidean  distances  between  pairs  of  original 
points.  This  MDS-PD  result  reveals  that  the  Iris  dataset  is  better  suited 
for  embedding  to  the  hyperbolic  plane  that  to  the  Euclidean  plane. 


A  Citation  Network 


Figure  10.  The  embedding  error  as  a  function  of  the  input  scaling 
factor  for  the  network  of  citations  fN  ewman  pool).  The  network  is 
an  example  of  weighted  graph  data  that  can  be  encoded  with  lower 
embedding  error  in  the  PD  model  than  in  the  Euclidean  plane.  The 
input  dissimilarities  are  capturing  the  strength  of  the  collaboration 
between  pairs  of  authors. 


The  Political  Books  Network 


Figure  9.  The  embedding  error  as  a  function  of  the  input  scaling 
factor  for  the  network  of  political  books  (|Krebs|[2008j).  The  input 
dissimilarities  are  simply  indicators  of  the  presence  or  absence  of 
links  between  the  network  nodes.  The  political  books  network  is  an 
example  of  unweighted,  undirected  real-world  graph  data  that  can  be 
embedded  with  lower  error  in  the  PD  model  than  in  the  Euclidean 
plane. 


to  the  resulting  dissimilarity  matrix.  We  also  conducted 
the  experiment  using  only  the  liberal  and  the  conser¬ 
vative  subgraphs  (43  and  49  points  respectively).  The 
obtained  minimum  embedding  error  of  150  replicates 
as  a  function  of  the  scaling  factor  a  is  shown  in  Fig. 
[9]  We  note  that  there  were  remarkable  gains  of  using 
the  PD  model  instead  of  the  Euclidean  plane.  For  the 
overall  graph,  the  minimal  stress  was  7.6  times  smaller 
than  the  Euclidean  stress.  The  liberal  and  conservative 
components  alone  achieved  improvement  of  8.8  and  9 
times  with  respect  to  the  Euclidean  case. 


3)  Citation  Network:  The  citation  network  that  we 
used  in  this  experiment  was  compiled  by  Newman  (2001 1 
from  bibliographies  of  review  articles  on  networking. 
We  extracted  the  largest  connected  component  from  the 
graph  which  consisted  of  379  nodes  representing  authors. 
There  were  244  edges  in  the  graph  with  weights  sjk 
representing  the  strength  of  the  collaborative  ties.  We 
have  calculated  dissimilarities  from  these  data  using 
8jk  =  const  —  Sjk  and  applied  the  MDS-PD  algorithm. 
The  obtained  minimum  embedding  error  of  50  replicates 
as  a  function  of  the  scaling  factor  a  is  shown  in  Fig.  [TO] 
The  overall  minimum  embedding  error  was  2.63  times 
lower  than  the  stress  obtained  using  Euclidean  MDS. 


Newman  (2001 


on  networking 


V.  RELATED  WORK 


A  multidimensional  scaling  algorithm  for  fitting  dis¬ 
tances  to  constant-curvature  Riemannian  spaces  was 
given  by  Lindman  and  CaeTE]  (1978]).  This  work  uses 
the  hyperboloid  model  of  the  hyperbolic  space  which  re¬ 
quires  an  n+  1 -dimensional  Euclidean  space  to  represent 
an  77-dimensional  hyperbolic  space,  and  is  less  suitable 
for  visualization  purposes  in  the  case  of  the  hyperbolic 
plane. 

The  applicability  of  metric  multidimensional  scaling 
in  the  Poincare  disk  model  of  the  hyperbolic  plane 
was  studied  by  Walter  (2004|).  The  study  focused  on 
the  task  of  embedding  higher-dimensional  point  sets 
into  2-dimensional  configurations  for  the  purpose  of 
interactive  visualization.  It  was  demonstrated  that  the  PD 
has  capacity  to  accommodate  lower  stress  embedding 
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Figure  11.  Comparison  of  the  point  trajectories:  H-MDS  of  J.  Walter 
|Walter|  <f2004|l  (left)  vs.  MDS-PD  hyperbolic  lines  (right) 


than  the  Euclidean  plane.  Several  important  pointers  to 
the  difficulties  one  encounters  in  implementing  such 
algorithms  were  given,  but  a  definite  specification  or 
implementation  was  not  provided. 

The  proposed  method  to  convert  the  seemingly  con¬ 
strained  optimization  problem  to  an  unconstrained  one 
( Waltcr|2004[  Eq.  12)  ensures  that  the  moving  configura¬ 
tion  would  stay  inside  the  model  during  the  optimization. 
However,  this  transformation  fails  to  follow  the  distance 
realizing  (hyperbolic)  lines,  or  even  Euclidean  lines.  The 
problem  is  illustrated  in  Fig.  m  The  possibility  that 
the  dissimilarity  matrix  has  missing  values  was  also 
not  addressed  in  this  work,  as  the  dissimilarities  were 
generated  from  higher-dimensional  points.  Inputs  derived 
from  graph  data,  however,  are  typically  sparse. 

|Shavitt  and  TankeT  ( 2008 1  presented  numerical  exam¬ 
ples  illustrating  that  models  of  the  hyperbolic  space  with 
circular  symmetry  may  be  better  suited  than  Euclidean 
spaces  for  embedding  network  graphs  with  core-and- 
tendrils  structure.  Their  work  concentrated  on  applica¬ 
tions  specific  to  communication  networks  where  dissim¬ 
ilarities  for  each  pair  of  points  are  derived  from  the 
lengths  of  the  shortest  paths  in  the  graph.  For  such 
applications,  the  authors’  insight  about  the  choice  of  the 
Poincare  disk  model  was  that  the  shortest  paths  in  the 
studied  networks  often  pass  through  the  core  and  are 
therefore  longer  than  the  straight-line  distance,  and  this 
observation  empirically  matches  the  behavior  of  distance 
function  in  the  chosen  model.  In  order  to  avoid  the 
constrained  nature  of  the  coordinates  in  the  PD,  the 
authors  eventually  resorted  to  the  hyperboloid  model  of 
the  hyperbolic  plane,  omitting  the  details. 

The  “big-bang  simulation”  (BBS)  numerical  method 
used  in  ([Shavitt  and  Tankelj  2008[),  is  discussed  in 
(Shavitt  and  Tankel|2004 1.  BBS  is  a  variant  of  a  steepest 
descent  method  that  models  the  point  configuration  as  an 


inertial  system  in  a  force-generating  field.  Termination 
is  guaranteed  by  introducing  empirical  dampening  in 
the  mechanical  system.  The  initial  configuration  in  BBS 
is  always  chosen  to  be  a  single  point  in  which  all 
particles  are  collocated,  ensuring  a  fair  initial  amount 
of  potential  energy.  Another  heuristic  feature  of  BBS 
is  that  the  objective  function  changes  several  times 
during  the  minimization  in  a  way  that  increases  the  error 
sensitivity  of  the  penalty  terms.  The  particle  inertia  in 
BBS  in  conjunction  with  a  stepwise  changing  objective 
function  possibly  allows  the  method  to  escape  a  few  local 
minima  before  termination.  However,  the  advantages  of 
these  heuristics  in  avoiding  local  minima,  compared  to  a 
computationally  simpler,  single  phase  minimization  pro¬ 
cedure,  were  not  clearly  demonstrated.  It  is  conceivable 
that  the  inertial  minimum-avoiding  mechanism,  which 
comes  at  an  increased  computational  cost,  may  as  well 
cause  the  configuration  to  leave  the  global  minimum,  or 
a  lower  local  minimum  before  stopping  in  a  higher  one. 
Finally,  since  BBS  can  only  be  started  from  one  possible 
initial  configuration,  it  has  a  deterministic  outcome  once 
the  heuristic  parameters  such  as  friction  and  time  slice 
are  chosen;  with  this  choice,  the  possibility  that  the  final 
result  is  improved  by  restarting  from  different  initial 
conditions,  is  eliminated. 

Numerous  methods  that  are  more  likely  to  find  a  lower 
minimum  than  the  simplest  repeated  descent  methods  in 
a  single  run  have  been  contemplated  in  the  numerical 
optimization  literature.  However,  to  guarantee  in  general 
that  the  global  minimizer  is  found  is  difficult  with  any 
such  method.  It  may  be  necessary  to  resort  to  running  the 
sophisticated  methods  several  times  as  well  in  order  to 
gain  confidence  in  the  final  result.  Since  these  methods 
are  usually  computationally  more  complex  or  incorporate 
a  larger  number  of  heuristic  parameters,  the  incurred 
computational  and  implementational  costs  often  offset 
the  benefits  of  their  sophistication. 

VI.  CONCLUSION 

We  developed  the  details  of  MDS-PD,  an  iterative 
minimization  method  for  metric  multidimensional  scal¬ 
ing  of  dissimilarity  data  in  the  Poincare  disk  model  of  the 
hyperbolic  plane.  While  our  exposition  concentrated  on 
a  simple  steepest  descent  minimization  with  approximate 
binary  hyperbolic  line  search,  we  believe  that  elements 
of  the  presented  material  will  also  be  useful  as  a  general 
recipe  for  transferring  other,  more  sophisticated  iterative 
methods  of  unconstrained  optimization  to  various  models 
of  the  hyperbolic  space. 
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We  believe  that  the  algorithm  specification  herein 
allows  for  easy  implementation  and  represents  a  useful 
novel  tool  for  the  scientific  community  and  that  its 
application  will  enable  a  wide  range  of  scientific  inquiry 
spanning  beyond  the  area  of  numerical  optimization 
methods. 

Our  initial  numerical  experiments  using  both  syn¬ 
thetic  and  real-world  data  suggest  that  MDS-PD  achieves 
significantly  better  results  compared  to  its  Euclidean 
counterpart  for  naturally  arising  networks  of  interaction. 
This  behavior  may  ultimately  be  attributed  to  the  less- 
restricted  axiomatic  foundation  of  the  hyperbolic  geom¬ 
etry  compared  to  its  Euclidean  counterpart.  Our  results 
justify  the  development  effort  and  encourage  future  work 
in  the  directions  of  generalizing  this  algorithm  to  higher¬ 
dimensional  models  of  the  hyperbolic  space,  improving 
its  efficiency,  and  establishing  novel  uses  and  applica¬ 
tions. 
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